Okay perfect. I can also see the chat.
Okay, our topic for today is twofold. For one, we have linear regression.
So regression, that is what we had in the last exercise, the programming exercise on regression.
What you basically want to do is: you have an input X, a real number or maybe a vector of real numbers,
and you want to predict either a real number Y or also a vector of real numbers.
And in our case we will do it only 1D to 1D, so X will be one-dimensional in our example and Y will also be one-dimensional.
So our goal is basically we have some data points which are just some events from the past.
And what we would like to do is we would like to derive from these events from the past a model that has as few parameters as possible
because probably our data points from the past have some noise on their position
and if we use as few parameters as possible then we have higher chances that our training data is sufficient
to properly derive a function that is able to predict events in the future.
So what we want is: when someone gives us an X, we want to be able to use a model to derive the value Y from that X.
And in the programming exercise we used a simple line equation for this model.
So our f of theta and X had this form here.
We could choose arbitrary thetas for our model,
but we tried to choose them in a way that the model fitted, or resembled, our training data as closely as possible.
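For reference, the line equation meant here is presumably the standard one-dimensional model (the parameter naming below is an assumption):

$$ f_\theta(x) = \theta_1 x + \theta_0 $$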
And the second topic for today is the multilayer perceptron, where we will have a look at neural networks.
So we will look at which activation functions we should use for them and what properties they have.
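As a quick pointer for that second part, the activation functions typically meant in this context are the sigmoid, tanh, and ReLU (this list is an assumption, not taken from the slides):

$$ \sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \tanh(x), \qquad \mathrm{ReLU}(x) = \max(0, x) $$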
So first, linear regression. This is basically what you should also do for the programming exercise.
In the programming exercise we were trying to do exactly that: we had a line which represented our model,
and we tried to fit the line in a manner that minimized the distance to our training points.
Yeah, we minimized these errors that we can measure here.
We had different ways to measure our errors. One was the L2 norm, where we took each of those errors,
which here just means taking the absolute value of this distance and squaring it,
and then summing all these values up.
As an alternative we also used the Huber norm, where we just applied a different function to those values.
So there we took this distance here, the distance r, and applied the Huber function to it,
which in some way gives those distances a different weight.
And yeah, depending on which loss function we chose we got different results for our model; depending on how we measure the errors,
we will get a different best solution.
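To make this concrete, with $r_i = y_i - f_\theta(x_i)$ denoting the $i$-th error, the two losses presumably compared in the exercise are

$$ L_{2}(\theta) = \sum_i r_i^2 \qquad \text{and} \qquad L_{\text{Huber}}(\theta) = \sum_i h_\delta(r_i), $$

where $h_\delta$ grows quadratically for small errors but only linearly for large ones, which is exactly why it weights outliers less heavily.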
Is the session recorded? Yes, I pressed on record right now.
So yeah, it should be available.
Let me check that. It should be recorded.
It's recorded, yeah.
And we saw that the L2 norm was more sensitive to outliers like this point; it tried to avoid big errors like this one here.
And yeah, in mathematical notation, this would be our data and this term here would be our model.
And yeah, there is an error here, so this should be a plus.
I don't know why I didn't fix that.
And yeah, what we try to do is this: this is the distance, the disagreement between our data and our model;
we plug it into a norm and then we sum up over all these errors.
And in this exercise we want to have a look at the Huber norm.
For the L2 norm we can use ordinary least squares to determine the optimal solution for this optimization problem.
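As a minimal sketch of that direct solution (the data and variable names here are illustrative, not from the exercise), a line $y \approx a x + b$ can be fitted in closed form, e.g. with numpy:

```python
import numpy as np

# Illustrative data points (not the exercise data).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.1, 0.9, 2.2, 2.8])

# Design matrix: one column for the slope a, one column of ones for the offset b.
A = np.stack([x, np.ones_like(x)], axis=1)

# Ordinary least squares: minimizes the sum of squared errors directly.
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
print(a, b)
```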
For the Huber norm there is no direct solution, so we need to derive the gradient of this objective function.
And this is what we will do in our first task for the day.
Yeah, so this is the objective function, the total objective function, and this is the definition of the Huber function for today.
Because I easily get confused with all the thetas and indices,
we just switch to different letters, so we will use a and b for the parameters today.
But yeah, this changes nothing about the principle that we will use.
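Written out with these letters, the objective presumably has the form below; $h_\delta$ is the standard Huber function with some threshold $\delta$ (the concrete threshold used on the slides is not repeated here):

$$ E(a, b) = \sum_i h_\delta\bigl(y_i - (a x_i + b)\bigr), \qquad
h_\delta(r) = \begin{cases} \tfrac{1}{2} r^2 & \text{if } |r| \le \delta, \\ \delta \left(|r| - \tfrac{1}{2}\delta\right) & \text{otherwise.} \end{cases} $$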
So if we want to use a gradient descent method, we need to derive the gradient of this objective function
in order to minimize our loss and get a better model in an iterative manner.
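As a rough sketch of how that iteration could look in code (learning rate, threshold delta, and number of steps are placeholder choices, not the ones from the exercise):

```python
import numpy as np

def huber_grad(r, delta=1.0):
    # Derivative of the standard Huber function with respect to the residual r.
    return np.where(np.abs(r) <= delta, r, delta * np.sign(r))

def fit_huber_line(x, y, lr=1e-3, steps=5000, delta=1.0):
    # Gradient descent on E(a, b) = sum_i huber(y_i - (a*x_i + b)).
    a, b = 0.0, 0.0
    for _ in range(steps):
        r = y - (a * x + b)        # residuals, i.e. the errors
        g = huber_grad(r, delta)   # dE/dr for each residual
        # Chain rule: dr/da = -x_i and dr/db = -1.
        a -= lr * np.sum(-x * g)
        b -= lr * np.sum(-g)
    return a, b
```

For small, well-behaved data sets the result should be close to the least-squares line; the two start to differ once outliers are present.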